Speaker detection using multi-speaker audio files for both enrollment and test
Authors
Abstract
This paper focuses on speaker detection using multi-speaker files for both the enrollment phase and the test phase. This task was introduced during the 2002 NIST speaker recognition evaluation campaign. Enrollment data consists of three two-speaker files, and the test files are also two-speaker recordings. The system presented here uses a speaker segmentation process based on an HMM conversation model, followed by a speaker matching technique, to produce one-speaker segments. Speaker detection is then performed using AMIRAL, LIA's GMM-based speaker verification system. The proposed strategy is validated using extracts from the NIST 2002 results.
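The final detection step can be illustrated with a generic GMM log-likelihood-ratio score. The sketch below is not AMIRAL and is not taken from the paper: the function names, the toy two-dimensional models, the threshold of 0.0, and the rule of keeping the best-scoring one-speaker segment are all assumptions made for illustration. It assumes that segmentation has already produced one-speaker segments, each given as a list of feature vectors.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def average_llr(frames, target_gmm, background_gmm):
    """Mean per-frame log-likelihood ratio: target model vs. background model."""
    total = sum(
        gmm_log_likelihood(f, target_gmm) - gmm_log_likelihood(f, background_gmm)
        for f in frames
    )
    return total / len(frames)

def detect(segments, target_gmm, background_gmm, threshold=0.0):
    """Score every one-speaker segment and accept if the best score exceeds
    the threshold (an assumed decision rule, not the paper's)."""
    scores = [average_llr(s, target_gmm, background_gmm) for s in segments]
    best = max(scores)
    return scores, best > threshold

# Hypothetical toy models: one target component, two background components.
target_gmm = [(1.0, [2.0, 2.0], [1.0, 1.0])]
background_gmm = [
    (0.5, [0.0, 0.0], [4.0, 4.0]),
    (0.5, [1.0, 1.0], [4.0, 4.0]),
]

# Two one-speaker segments from a hypothetical two-speaker test file.
segment_a = [[2.0, 2.0], [2.1, 1.9]]      # close to the target model
segment_b = [[-2.0, -2.0], [-1.9, -2.1]]  # far from the target model

scores, accepted = detect([segment_a, segment_b], target_gmm, background_gmm)
```

Scoring each segment separately means the second speaker in a two-speaker file does not dilute the target's score, which is the motivation for segmenting before verification.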
Similar resources
The Speakers in the Wild Speaker Recognition Challenge Plan
The Speakers in the Wild (SITW) speaker recognition challenge (SRC) is intended to support research toward the real-world application of automatic speaker recognition technology across speech acquired in unconstrained conditions. The SITW SRC will serve to benchmark current technologies in both single and multi-speaker audio with the dataset and annotations being made publicly available (under ...
A semi-automatic approach for speaker mining of tapped telephone conversations
Speaker mining involves speaker detection in a set of multispeaker files. In previous work on speaker mining, training data is used for constructing target speaker models. In this study, a new speaker mining scenario was considered, where there is no demarcation between training and testing data and prior target speaker models are absent. Given the ENRON database which consists of tapped teleph...
Acoustic hole filling for sparse enrollment data using a cohort universal corpus for speaker recognition
In this study, the problem of sparse enrollment data for in-set versus out-of-set speaker recognition is addressed. The challenge here is that both the training speaker data (5 s) and the test material (2 to 6 s) are of limited duration. The limited enrollment data results in a sparse acoustic model space for the desired speaker model. The focus of this study is on filling these acoustic holes by h...
Robust Voice Mining Techniques for Telephone Conversations
Title of thesis: Robust Voice Mining Techniques for Telephone Conversations. Sandeep Manocha, Master of Science, 2006. Thesis directed by: Dr. Carol Y. Espy-Wilson, Department of Electrical Engineering. Voice mining involves speaker detection in a set of multi-speaker files. In published work, training data is used for constructing target speaker models. In this study, a new voice mining scenario w...
PSO Based Optimized Reliability for Robust Multimodal Speaker Identification
Reliable speaker recognition in real environments is a key challenge for ubiquitous services in human-computer interfaces. In this paper, we present a robust multimodal speaker identification system with optimized reliability of the different modalities. We propose an extension of the modified convection function's optimizing factors to account for optimum reliability simultaneously in audio, face a...